Dynamic Normalization and Relay for Video Action Recognition
Convolutional Neural Networks (CNNs) have been the dominant models for video action recognition. Due to their huge memory and compute demands, popular action recognition networks must be trained with small batch sizes, which makes learning discriminative spatial-temporal representations for videos challenging. In this paper, we present Dynamic Normalization and Relay (DNR), an improved normalization design, to augment the spatial-temporal representation learning of any deep action recognition model under small-batch training settings. We observe that state-of-the-art action recognition networks usually apply the same normalization parameters to all video data, ignoring the dependencies of the estimated normalization parameters between neighboring frames (at the same layer) and between neighboring layers (over all frames of a video clip). Motivated by this, DNR introduces two dynamic normalization relay modules that exploit cross-temporal and cross-layer feature distribution dependencies to estimate accurate layer-wise normalization parameters. Both DNR modules are instantiated as light-weight recurrent structures conditioned on the current input features and on the normalization parameters estimated either from neighboring-frame features at the same layer or from whole-clip features at the preceding layers. We first plug DNR into prevailing 2D CNN backbones and test its performance on public action recognition datasets including Kinetics and Something-Something. Experimental results show that DNR brings large performance improvements to the baselines, achieving over 4.4% absolute margins in top-1 accuracy without training bells and whistles.
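The cross-temporal relay idea can be illustrated with a toy sketch. This is not the paper's actual module: the function name, the exponential-moving-average relay, and the `alpha` parameter are all illustrative assumptions. The point it demonstrates is that per-frame normalization statistics are carried forward to neighboring frames instead of each frame being normalized independently.

```python
import numpy as np

def dynamic_norm_relay(frames, eps=1e-5, alpha=0.9):
    """Toy cross-temporal normalization relay (illustrative sketch only).

    frames: list of (C, H, W) arrays belonging to one video clip.
    Per-frame channel statistics are relayed to the next frame via an
    exponential moving update, so neighboring frames share distribution
    information rather than being normalized independently.
    """
    mu, var = None, None
    out = []
    for x in frames:
        m = x.mean(axis=(1, 2), keepdims=True)  # per-channel mean, shape (C, 1, 1)
        v = x.var(axis=(1, 2), keepdims=True)   # per-channel variance
        if mu is None:
            mu, var = m, v                       # first frame: no history yet
        else:
            # relay: condition the current statistics on the neighboring frame's
            mu = alpha * mu + (1 - alpha) * m
            var = alpha * var + (1 - alpha) * v
        out.append((x - mu) / np.sqrt(var + eps))
    return out
```

The paper's modules additionally relay across layers and learn the update from the input features; the fixed-momentum update here only illustrates the cross-temporal direction.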
- North America > Canada > Quebec > Montreal (0.04)
- Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.96)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.86)
Appendices for the Paper "pFL-Bench: A Comprehensive Benchmark for Personalized Federated Learning"
Sec. A: details of the adopted datasets and models (e.g., tasks, heterogeneous partitions, and FL processes), as shown in Figure 4 in the main body of the paper. CIFAR-10 is a popular dataset for 10-class image classification containing 60,000 colored images with a resolution of 32x32 pixels. Its train/valid/test sets have a ratio of about 3:1:5. For another of the adopted datasets, the train/valid/test ratio is about 14:3:3. For all these experimental datasets, we randomly select 20% of clients as new clients that do not participate in the FL processes.
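The held-out new-client selection described above can be sketched as follows. The function name, the seeding scheme, and the return convention are assumptions for illustration, not the benchmark's actual code.

```python
import random

def split_new_clients(client_ids, new_frac=0.2, seed=0):
    """Randomly hold out a fraction of clients as 'new' clients that never
    participate in the FL process (illustrative sketch).

    Returns (participating_clients, new_clients).
    """
    rng = random.Random(seed)      # fixed seed for a reproducible split
    ids = list(client_ids)
    rng.shuffle(ids)
    k = int(len(ids) * new_frac)   # e.g., 20% of clients become new clients
    return ids[k:], ids[:k]
```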
Find A Winning Sign: Sign Is All We Need to Win the Lottery
Oh, Junghun, Baik, Sungyong, Lee, Kyoung Mu
The Lottery Ticket Hypothesis (LTH) posits the existence of a sparse subnetwork (a.k.a. winning ticket) that can generalize comparably to its over-parameterized counterpart when trained from scratch. The common approach to finding a winning ticket is to preserve the original strong generalization through Iterative Pruning (IP) and transfer information useful for achieving the learned generalization by applying the resulting sparse mask to an untrained network. However, existing IP methods still struggle to generalize their observations beyond ad-hoc initialization and small-scale architectures or datasets, or they bypass these challenges by applying their mask to trained weights instead of initialized ones. In this paper, we demonstrate that the parameter sign configuration plays a crucial role in conveying useful information for generalization to any randomly initialized network. Through linear mode connectivity analysis, we observe that a sparse network trained by an existing IP method can retain its basin of attraction if its parameter signs and normalization layer parameters are preserved. To take a step closer to finding a winning ticket, we alleviate the reliance on normalization layer parameters by preventing high error barriers along the linear path between the sparse network trained by our method and its counterpart with initialized normalization layer parameters. Interestingly, across various architectures and datasets, we observe that any randomly initialized network can be optimized to exhibit low error barriers along the linear path to the sparse network trained by our method by inheriting its sparsity and parameter sign information, potentially achieving performance comparable to the original. The code is available at https://github.com/JungHunOh/AWS_ICLR2025.git.
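The sparsity-and-sign transfer idea, and the linear path used in linear mode connectivity analysis, can be sketched as follows. These function names and the magnitude-from-init convention are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def apply_sign_and_mask(random_init, trained, mask):
    """Transfer sparsity and parameter-sign information to a random init
    (illustrative sketch): keep the random init's magnitudes, force the
    trained network's signs, and zero out pruned weights."""
    return np.abs(random_init) * np.sign(trained) * mask

def linear_path(theta_a, theta_b, n=5):
    """Points on the linear path between two parameter vectors, as used in
    linear mode connectivity analysis (the error barrier is the worst loss
    along such a path)."""
    return [(1 - t) * theta_a + t * theta_b for t in np.linspace(0.0, 1.0, n)]
```

Evaluating the loss at each point of `linear_path` between two trained networks is the standard way to measure whether they share a basin of attraction.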
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Contests & Prizes (1.00)
- Research Report > New Finding (0.46)
HyperCLIP: Adapting Vision-Language models with Hypernetworks
Akinwande, Victor, Norouzzadeh, Mohammad Sadegh, Willmott, Devin, Bair, Anna, Ganesh, Madan Ravi, Kolter, J. Zico
Self-supervised vision-language models trained with contrastive objectives form the basis of current state-of-the-art methods in AI vision tasks. The success of these models is a direct consequence of the huge web-scale datasets used to train them, but they require correspondingly large vision components to properly learn powerful and general representations from such a broad data domain. This poses a challenge for deploying large vision-language models, especially in resource-constrained environments. To address this, we propose an alternate vision-language architecture, called HyperCLIP, that uses a small image encoder along with a hypernetwork that dynamically adapts image encoder weights to each new set of text inputs. All three components of the model (hypernetwork, image encoder, and text encoder) are pre-trained jointly end-to-end, and with a trained HyperCLIP model, we can generate new zero-shot deployment-friendly image classifiers for any task with a single forward pass through the text encoder and hypernetwork. HyperCLIP increases the zero-shot accuracy of SigLIP-trained models with small image encoders by up to 3% on ImageNet and 5% on CIFAR-100 with minimal training throughput overhead. A now-standard approach in deep learning for vision tasks is to first pre-train a model on web-scale data and then adapt this model for a specific task using little or no additional data. Despite the widespread success of these models and their lack of reliance on large-scale labeled datasets, a significant downside is that these models are often on the order of billions of parameters, much larger than their supervised counterparts for a given task at the same accuracy level. While these pre-trained models are powerful due to their generality, practitioners still need to apply them to well-defined and specific tasks. We consider settings where there are additional constraints on the size of these models such as in edge computing applications.
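The hypernetwork mechanism can be illustrated with a toy sketch: a single linear map turns a pooled text embedding into the weights of a small linear image encoder, so each new task gets adapted encoder weights from one forward pass on the text side. All names, shapes, and the linear form here are illustrative assumptions, not HyperCLIP's actual architecture.

```python
import numpy as np

def generate_encoder_weights(text_emb, W_hyper, d_out, d_img):
    """Toy 'hypernetwork' (illustrative sketch): map a pooled text embedding
    to the weight matrix of a small linear image encoder."""
    flat = W_hyper @ text_emb          # one forward pass on the text side
    return flat.reshape(d_out, d_img)  # task-adapted encoder weights

def encode_image(weights, x_img):
    """Apply the generated, task-adapted weights to an image feature vector."""
    return weights @ x_img
```

In the real model the hypernetwork and both encoders are deep networks trained jointly end-to-end; the sketch only shows the data flow that makes zero-shot classifier generation a single text-side forward pass.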
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)